ABSTRACT

Today’s psychological measurement depends almost 
exclusively on the "standardized test." A certain amount of
non-standardization, however, exists in the administration of any 
standardized test, with the amount unknown for any given test score. 
Time limits on tests pose a bigger problem since another variable is 
introduced, pressure. Test taking motivation must also be considered. 
The test could be too easy or too difficult, thus boring or 
frustrating the individual. Reliability is also a difficulty, since 
there is no true reliability computed for an individual. Proper 
application of computer technology permits a solution to many of the 
problems raised by standardized tests. The tests would be 
individualized, with items of known difficulty grouped or stratified 
by level of difficulty. The testing situation could be tailored to
fit an individual’s preferences and/or abilities and disabilities. 
Administrative fluctuations and test taking motivation could be 
eliminated. Individualized item sequence would tailor the test to the 
individual, as far as difficulty is concerned. Through the item 
sequence, reliability would become more accurate, as the computer
could more exactly pinpoint levels of difficulty. (KJ)

While the field of psychometrics has made steady progress in
many technical aspects during the last thirty years or so, until 
now there has been no major technological innovation which has 
promise of freeing psychometrics from many current problems. The 
recent development and expansion of computer technology promises 
us a method which could have profound effects on both psychometric 
theory and practice. 

The "standardized test"

Today's psychological measurement depends almost exclusively on the
"standardized test." Standardization is the process of developing
a common item sequence, a standard mode of administration and, in many
cases, a "fair" set of time limits. The objective is to present each
individual with the same set of stimulus variables, and to measure the
underlying ability by comparing his responses to this structured set
with the responses of other individuals. The relative ability levels
of two individuals are then determined by the differences in the
number of items answered correctly, as compared to some norm group.

Non-standard administration. While the idea of standardization
was a major achievement toward solving some of the earlier problems
in psychological measurement, it, in turn, raised many other problems
which have rarely been confronted. First, while the physical stimulus
complex represented by the test booklet was standardized, many of the
other variables surrounding test administration were not. Thus,
differences in test administrator, in terms of the administrator's sex,
race, or commitment to the task, could not be standardized.

Research on these variables frequently shows differences in group
means when groups are tested by different administrators. It can
frequently be assumed, therefore, that a certain amount of
non-standardization exists in the administration of any standardized
test, with the amount unknown for any given test score. Care must
then be exercised in the interpretation and use of scores when
conditions of administration are not clearly specified.

Time limits. A more serious problem in the use of standardized
tests is in the imposition of time limits. Tests are usually timed
for convenience of administration. Most psychometricians would
probably agree that a "power" test is more relevant to measuring most
kinds of abilities, and that the majority of criteria that we are
trying to predict from ability tests are not heavily speeded. Yet most
ability tests are built with time limits that force individuals to
pace themselves unnecessarily. Time limits may penalize the slower,
but more accurate and capable individual, while benefiting the faster
individual who may have less of the ability being measured than the
slower, more methodical, person. We have tried to correct for this
situation in at least two ways. First, we develop time limits that
permit 95% of the examinees to finish the test. This procedure is a
relatively good solution, but it still may penalize the 5% of individuals
who do not finish. In addition, it brings into the measurement
situation extraneous personality variables which are unrelated to the
ability being measured. Such variables include the ability to
work under time pressure and the propensity to react to time pressure
stimulation. While perhaps there are few important differences in
these two personality characteristics within the white middle class
population, application of timed tests to other groups, such as the
American Indian, the Black American, or the Oriental American, may yield
results that are invalid because of cultural differences in the propensity
to react to time pressures as a source of motivation.

We have also tried to correct for the differential influence of
time pressures by adopting a "correction for guessing." In this way,
the individual who reacts quickly to test items and guesses freely is
penalized by adjusting his score downward on the basis of the number of
items he answered correctly or incorrectly. Even though the
different adjustment formulas yield different results, the procedure
has some merit for the fast individual who tends to guess. But we
still have not found a way to perform an upward correction on the
scores of the slower individual who does not guess. Granted, the
differences between the fast and slow individual are reduced by the
correction for guessing, but we still have no way of knowing how
capable the slow individual really is. Our only solution is a
completely non-speeded test, but in many cases such a measurement
procedure is virtually impossible. This is especially true in the
vocational assessment procedure where we wish to measure an
individual's capacities on as many as ten or more independent
abilities. The use of pure power tests would probably require several
days of testing time for each person, with the consequent reduction
in test-taking motivation that is likely to result from intensive
testing.
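
The paper does not say which adjustment formula it has in mind. As an
illustration only, the sketch below assumes the conventional
rights-minus-wrongs formula, R - W/(k - 1), where k is the number of
choices per item. The function name and the two examinees are
hypothetical.

```python
def corrected_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted (unattempted) items are neither rewarded nor penalized.
    This is one common formula, assumed here for illustration.
    """
    if num_choices < 2:
        raise ValueError("an item needs at least two choices")
    return num_right - num_wrong / (num_choices - 1)


# Hypothetical 50-item test with five choices per item.
# The fast examinee attempts all 50 items; the slow one attempts only 30.
fast = corrected_score(num_right=34, num_wrong=16, num_choices=5)
slow = corrected_score(num_right=28, num_wrong=2, num_choices=5)
```

Under these assumed figures the raw difference of 6 points shrinks to
2.5 points after correction, yet the formula says nothing about how
the slow examinee would have done on the 20 items never reached,
which is the limitation described above.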

Test-taking motivation. This leads us to the third major
problem raised by the use of standardized tests. Because the
standardized test uses a common item sequence for every individual,
it is wasteful of an individual's time. But, more important, it can
have unmeasurable effects on test scores by affecting the individual's
test-taking motivation. For some individuals, the standardized test
is too easy. As a result, these individuals may get bored with answering
a series of items which are not challenging enough to motivate them
to high performance. The test may then be considered a "stupid waste
of time" with consequent effects on the person's test scores. These
invalid scores may then be used in institutional decisions relative
to that individual with consequent detrimental effects on his freedom
of choice.

For other individuals, a standardized test may be too difficult.
In this case the individual is presented with a series of items that
he may find utterly frustrating. While parts of the test may include
easier items which he could validly answer, the individual may develop
a negative attitude toward the test and cease answering out of frustration.
While there are obviously individual differences in the reactions to
such a situation, at the present time we have no way to separate those
individuals who obtain a low score due to frustration from those whose
low score is the result simply of lack of ability on the dimension
measured by the test. While various test design formats have
been developed which have implications for this problem, such as the
"spiral omnibus" form of item arrangement, presentation of a set
of items in a given order on a printed page is no guarantee that an
individual will answer them in that order, nor can we gauge his
reactions to the total stimulus complex in order to estimate its
effects on his test scores.

Reliability. A related problem that arises in the use of
standardized tests is the interpretation of the reliability of the
test score, or the accuracy or precision that can be attached to a
given measurement. Since the standardized test is frequently built
for maximum discrimination in the middle ranges of the variable
being measured, measurements at the extremes usually differ in
reliability from those in the middle range of ability. This results
partially from the fact that there are more items at the middle
ranges; hence, the accuracy of measurement is greatest as a result
of the larger number of items. At the high and low extremes there are
frequently fewer items, thus reducing accuracy. The lowered accuracy
at the extremes based simply on the number of items is, of course, further
complicated by some of the other variables previously mentioned.

While the nature of the standardized test thus affects the accuracy
or precision of measurement, we also have problems in the estimation
of reliability. The most relevant estimate of reliability of an
individual's test score is an error band around the score showing the
degree of confidence that can be placed in interpretation and use of
the obtained measurement. Reliability theory tells us to compute the
standard error of measurement and use some function of it as our "error
band." But the standard error of measurement is a group statistic.
As such its value varies as a function of the reliability coefficient
for a group of individuals, and the standard deviation of the test
scores for that group. The obtained standard error of measurement
figure is therefore applicable to the average member of a given group.

We know that both reliability coefficients and standard deviations
of test scores vary from group to group. We constantly tell our
students that "there is no one reliability for a test", that
"reliability is for a given measurement." But, if one day an
individual is a member of one group and another day a member of another
group, it is perfectly possible that two identical measurements
obtained on the same test for that individual will have different
accuracy, as measured by the standard error of measurement. This may
result simply because of different reliabilities and/or variabilities
for the two groups. Yet there is no way to know which "error band"
is the true one for that individual. Thus, the standardized test has
given us a method for estimating reliability which does not permit
an accurate reliability estimate of a single measurement taken on one
individual; rather the accuracy of a given measurement varies artificially
as a result of the group an individual happens to be tested with.
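
The group dependence of the error band can be made concrete with the
classical formula for the standard error of measurement,
SEM = SD * sqrt(1 - r), where SD is the group's standard deviation and
r its reliability coefficient. The sketch below uses hypothetical
figures for two groups; the function names and all numbers are
assumptions chosen for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM for a group: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def error_band(score: float, sd: float, reliability: float, z: float = 1.0):
    """Band of +/- z SEMs around an obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)


# The same obtained score of 60, interpreted against two hypothetical groups.
band_group_a = error_band(60, sd=10, reliability=0.91)  # SEM about 3
band_group_b = error_band(60, sd=15, reliability=0.84)  # SEM about 6
```

With these assumed values the identical score carries a band roughly
twice as wide in the second group as in the first, although nothing
about the individual's performance has changed.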

The computer-based assessment system 

Proper application of computer technology permits a solution to
many of these problems raised by the use of standardized tests. By
appropriate redesign of our testing instruments, computers can be
programmed to administer psychological tests in an "individualized" or
"tailored" (Lord, 1968, 1969) fashion.[1] Rather than requiring the individual
to adapt himself to a standardized test, the computer can be programmed
to adapt the tests to the characteristics of the individual, or to
"individualize" the testing procedure.

[1] Computer-based assessment procedures have also been referred to as
"branching" tests (Bayroff and Seeley, 1967), programmed tests (Cleary,
Linn and Rock, 1968; Linn, Rock and Cleary, 1969) and "sequential"
tests (Cronbach and Gleser, 1965).

The basic system. Individualized assessment assumes a large item
pool for each ability to be measured. The optimal system would require
items to be of known difficulty level, with items grouped or stratified
according to difficulty level. At each difficulty level there would
be as many as 25 or 50 or more items in storage, each item known to be
measuring the same variable at the same level. These items are then
stored in the computer and identified by both dimension and difficulty
level. The items are accessed by a program which controls the testing
procedure.
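
One way such an item pool might be organized is sketched below. The
paper specifies only that items are identified by dimension and
difficulty level; the class names, fields, and the rule of never
repeating an item within a session are assumptions made for
illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    dimension: str        # the ability measured, e.g. "verbal"
    difficulty: int       # stratum of known difficulty
    choices: tuple
    correct_index: int


class ItemPool:
    """Items stored under a (dimension, difficulty) key."""

    def __init__(self):
        self._items = defaultdict(list)

    def add(self, item: Item) -> None:
        self._items[(item.dimension, item.difficulty)].append(item)

    def levels(self, dimension: str) -> list:
        """Sorted difficulty levels available for one dimension."""
        return sorted(d for (dim, d) in self._items if dim == dimension)

    def draw(self, dimension: str, difficulty: int, used: set, rng: random.Random):
        """Pick an unused item at the requested level, or None if exhausted."""
        candidates = [
            item
            for item in self._items.get((dimension, difficulty), [])
            if item.item_id not in used
        ]
        return rng.choice(candidates) if candidates else None


# Hypothetical two-level pool with two items per level.
pool = ItemPool()
for level in (1, 2):
    for n in range(2):
        pool.add(Item(f"verbal-{level}-{n}", "verbal", level, ("a", "b", "c", "d", "e"), 0))
```

The testing program would call draw with the set of item identifiers
already presented, so that an examinee never sees the same item twice
at a given level.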



The individual to be tested may appear for testing at any time
the computer is free. He sits down at a control panel which can
include a variety of input-output devices. This variety of devices
for presenting items and recording responses is the first major
advantage of this type of measurement.

Individualized input. The typical individualized measurement system
will likely have items presented on a cathode ray tube or CRT. In
many cases the individual will respond by touching a light-pen to the
correct answer. But for some people, such as the physically disabled
with motor problems or eye-hand coordination problems, responses may be
recorded on a typewriter unit, a series of foot pedals, large
push-buttons, or, within the next few years, by a computer-driven
voice-writer. For those people with visual problems, the items can be
presented double-sized on the cathode ray screen, or projected via
computer-driven slide projectors onto movie screens in two-foot
letters. For the totally blind, items could be presented aurally,
via computer-driven random access tape recorders. A variety of other
input-output devices could be developed to tailor the testing
situation to the individual's preferences and/or abilities or disabilities.
Input-output devices can also be varied to maintain an interest in
the testing procedure; such flexibility might be particularly
effective with children.

The choice of communication devices as well as subsequent
choice of items can be under computer control. Given the computer's
capacity to store relevant information on an individual, the
individual would simply be required to input his name or identification
number to begin the testing process. All subsequent decisions based
on this piece of information could be under program control. For
example, our system will not require an individual to complete
any one test at one sitting. The subject will be permitted to leave
the testing room at any time and return at any time. Input of his
name at subsequent sessions allows the computer to immediately "recall"
all previous responses and to continue exactly at the point at which
he left off, whether the interval be three minutes or three months.
In this way, we will not force an individual to take a test under
non-optimal conditions of health and/or motivation. We can also
eliminate administrator effects and possibly reduce fluctuations
in scores resulting from some aspects of test-taking motivation
as well as other factors usually affecting measurement accuracy.
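
The "recall" of previous responses amounts to keeping a persistent
record keyed by the examinee's identification number. A minimal
sketch follows, assuming one JSON file per examinee; the class name,
file layout, and record fields are hypothetical and are not drawn
from the system described here.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Keeps each examinee's responses in a file named after the ID."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, examinee_id: str) -> Path:
        return self.directory / f"{examinee_id}.json"

    def load(self, examinee_id: str) -> dict:
        """Return the stored session, or an empty one for a new examinee."""
        path = self._path(examinee_id)
        if path.exists():
            return json.loads(path.read_text())
        return {"examinee_id": examinee_id, "responses": []}

    def record(self, examinee_id: str, item_id: str, correct: bool) -> dict:
        """Append one response and write the session back immediately."""
        session = self.load(examinee_id)
        session["responses"].append({"item_id": item_id, "correct": correct})
        self._path(examinee_id).write_text(json.dumps(session))
        return session


# First sitting: one response is recorded, then the examinee leaves.
with tempfile.TemporaryDirectory() as folder:
    SessionStore(folder).record("A17", "verbal-3-12", True)
    # Later sitting, possibly months afterward: the record is recalled.
    resumed = SessionStore(folder).load("A17")
```

Because every response is written as soon as it is given, the length
of the interval between sittings makes no difference to what is
recalled.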

Individualized item sequences. The individualized assessment
system can be designed to eliminate or minimize many of the other
problems resulting from the use of standardized tests. The test
administration program can be designed to start every individual
at an item of middle range difficulty or at some estimated ability
level, based on other information available prior to testing. If
the individual gets the item correct, the next item to be presented
will be one of higher difficulty. Each correct response leads to a
more difficult item, until a wrong response occurs. At that point,
the program then chooses an item lower in difficulty than the lowest
item answered correctly. The effect of this procedure is to keep the
test at a relevant level for the individual. If this item is answered
correctly, the computer can proceed to items of more difficulty as before.
Or, it can be programmed to alternate between difficult and easy items
in some systematic fashion. There is an endless variety of procedures
to follow at this point. The objective, of course, is to maintain the
individual's interest in the task and motivation to continue, by
presenting items which are neither too easy nor too difficult for him.

The objective would be similar to that developed by Binet 60 years
ago. We use the computer to find the level of difficulty at which
the individual gets all the items correct, and the level of difficulty
at which he gets all the items wrong. Having found this "lower
shelf" and "upper shelf" we then, in some systematic fashion, fill
in the spaces between in an attempt to pinpoint the individual's capacity
in terms of highest difficulty possible for him. We differ from the
original Binet procedure in that we are measuring on a unidimensional
variable, rather than on undefined global "intelligence", and that
the items are presented by computer rather than a human psychometrist,
with all the attendant interpersonal contamination factors.
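
The stepping rule and the two shelves can be sketched as follows. This
is only one of the many possible procedures, under stated assumptions:
difficulty levels are consecutive integers, the examinee is simulated
by a hypothetical respond function, and the shelves are read off
crudely as the highest level answered entirely correctly and the
lowest level answered entirely incorrectly.

```python
def run_step_rule(levels, start, respond, max_items=20):
    """Move up one level after a correct answer; after a wrong answer,
    drop to just below the lowest level answered correctly so far."""
    history = []
    level = start
    lowest_correct = None
    for _ in range(max_items):
        correct = respond(level)
        history.append((level, correct))
        if correct:
            lowest_correct = level if lowest_correct is None else min(lowest_correct, level)
            next_level = level + 1
        else:
            base = lowest_correct if lowest_correct is not None else level
            next_level = base - 1
        if next_level < levels[0] or next_level > levels[-1]:
            break  # ran off the end of the available difficulty range
        level = next_level
    return history


def shelves(history):
    """Return (lower_shelf, upper_shelf) from a response history."""
    tallies = {}
    for level, correct in history:
        right, total = tallies.get(level, (0, 0))
        tallies[level] = (right + int(correct), total + 1)
    all_right = [lvl for lvl, (right, total) in tallies.items() if right == total]
    all_wrong = [lvl for lvl, (right, total) in tallies.items() if right == 0]
    lower = max(all_right) if all_right else None
    upper = min(all_wrong) if all_wrong else None
    return lower, upper


# Simulated examinee who answers correctly up to level 6 and fails above it.
levels = list(range(1, 11))
history = run_step_rule(levels, start=5, respond=lambda lvl: lvl <= 6, max_items=12)
lower_shelf, upper_shelf = shelves(history)
```

A real examinee would not respond this consistently, so the shelves
would usually lie further apart, leaving intermediate levels to be
filled in by further items.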

Individualized precision of measurement. Given the fact that
we have in storage a large number of items at each difficulty
level, following identification of the individual's upper and lower
shelves, we can then concentrate all further measurement within
the area that is relevant for that individual. At this point we
have an opportunity to clarify reliability of measurement for one
individual. While the upper and lower shelves give us a gross
estimate of the maximum level of difficulty of which the individual
is capable, the computer can then present items at the levels in
between to obtain a revised, more accurate, maximum level within
certain limits of accuracy. The reliability or precision of the
obtained measurements can be controlled by the investigator and
varied according to the purpose for which the measurement is to be
obtained.

Under this system of measurement, a person's score on the test
is the maximum difficulty level reached within a certain probability
of error. If there were fifty items at a given difficulty level,
with five choices each, we could assume that ten of these would be
answered correctly purely by chance. Given the fact that an
individual answers 25 of these 50 correctly at difficulty level X, and that
he answers 8 of 50 correctly at difficulty level X+1, and further that
he answers 42 of 50 correctly at difficulty level X-1, we can be
fairly confident that his "true" ability level lies at difficulty
level X. We can be even more confident in the accuracy of that
score by presenting larger numbers of items at each of the relevant
levels. This procedure permits us to narrow our lower and upper
shelves to converge on the individual's ability level within the
required degree of accuracy. This same process can be repeated
on other ability dimensions for the same individual, with the degree
of accuracy on that dimension relevant to the decision to be made
on that piece of information.
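
The arithmetic of this example can be checked directly. The sketch
below computes the number of items expected correct by chance and,
as an added assumption not stated in the paper, uses the binomial
distribution to ask how likely a result at least as good would be
under blind guessing, together with a conventional chance-corrected
proportion. The function names are hypothetical.

```python
from math import comb


def chance_expected(n_items: int, n_choices: int) -> float:
    """Items expected correct under blind guessing."""
    return n_items / n_choices


def prob_at_least(n_correct: int, n_items: int, n_choices: int) -> float:
    """Binomial probability of n_correct or more right by guessing alone."""
    p = 1.0 / n_choices
    return sum(
        comb(n_items, k) * p**k * (1.0 - p) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )


def above_chance(n_correct: int, n_items: int, n_choices: int) -> float:
    """Proportion correct rescaled so chance is 0 and a perfect score is 1."""
    chance = chance_expected(n_items, n_choices)
    return (n_correct - chance) / (n_items - chance)


# The three difficulty levels from the example: 50 five-choice items each.
observed = {"X-1": 42, "X": 25, "X+1": 8}
rescaled = {level: above_chance(right, 50, 5) for level, right in observed.items()}
guess_prob = {level: prob_at_least(right, 50, 5) for level, right in observed.items()}
```

On these figures the rescaled proportions are 0.8 at level X-1, 0.375
at level X, and slightly below zero at level X+1, and a score of 8 or
more at X+1 is quite likely under pure guessing. Both results are
consistent with placing the individual at level X.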

The individualized assessment system would also permit us to
develop and use tests of varying levels of accuracy, in terms of how
finely separated the component difficulty levels are. We could
develop some gross screening-type instruments for measuring second-
or third-order abilities, then proceed to the finer measuring
instruments within only those gross levels that are identified as
high or low for each individual. This would permit us to measure
a wide variety of abilities on each individual in a minimum amount
of time to pre-determined levels of accuracy.

Other advantages of individualized measurement. There are other
aspects of an individualized system which hold promise for applications
of psychometrics. Primary among these is the ability of the system
to increase "motivation" by appropriate methods of feedback. The
computer can inform an individual, in a variety of ways, whether his
answer was right or wrong. It can use flashing lights, printed words,
verbal reinforcement or food pellets dropped down a slide. We can
increase motivation by tailoring the reinforcement to the individual;
not all individuals are reinforced by knowledge of results. Individuals
from different sub-cultures may require different kinds of reinforcing
stimuli. For some, such as children, food or food-chip equivalents
might be relevant. For some individuals a form of reinforcement
may occur by varying systematically the inter-mix of items from various
ability domains. Other ways of motivating individuals will undoubtedly
be developed as individualized measurement systems become operational.

Computer-based assessment systems are obviously not bound to
time limits. Rather, the computer can record the amount of time
it takes for an individual to answer a given item. This information,
in conjunction with information on whether each response was correct,
can be combined into indices of reliability. In addition, it would
seem possible that judicious use of time latency measures could
assist in separating out responses that are "guessed" vs. those
that are obviously known immediately by the individual. Such
information can assist in helping to norm items, as well as in
interpreting the results of computer-based assessment.
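
Recording response latency requires only that the program read a
clock when the item is presented and again when the answer arrives.
The sketch below is an assumption-laden illustration: the callback
names are hypothetical, and the summary by outcome is merely one
simple way of looking at the data, not an index proposed in this
paper.

```python
import time
from statistics import mean


def timed_response(present_item, get_response, clock=time.monotonic):
    """Present an item, wait for the answer, and return (answer, seconds)."""
    present_item()
    start = clock()
    answer = get_response()
    return answer, clock() - start


def mean_latency_by_outcome(records: list):
    """records holds (correct, latency_seconds) pairs.

    Returns (mean latency of correct answers, mean latency of wrong
    answers); either value is None when no such answers exist.
    """
    right = [seconds for correct, seconds in records if correct]
    wrong = [seconds for correct, seconds in records if not correct]
    return (mean(right) if right else None, mean(wrong) if wrong else None)


# Hypothetical latencies, in seconds, for three responses.
summary = mean_latency_by_outcome([(True, 1.0), (True, 3.0), (False, 6.0)])
```

Whether short or long latencies actually indicate guessing is an
empirical question that such records would allow the investigator to
examine.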

An additional possibility is the use of the computer to, finally,
measure the ability to learn, or what we have called "aptitude" all
these years. We can do this by measuring an individual's status
on a given ability, then presenting a learning situation relevant to that
ability (including knowledge of results), then finally a post-test.
The difference between pre-test and post-test, with interim learning
held constant, could be a measure of "ability to learn" on a given
dimension. Such a procedure would seem to permit us to separate
ability (measured status at one point in time) from aptitude (capacity
to profit from learning). This approach would seem to have primary
relevance to measurement with the "disadvantaged".

While computer-based assessment may seem relatively expensive,
the cost of computer hardware is continually decreasing. Within ten
years computers are likely to be as readily available as calculating
machines. Most high schools, colleges, counseling agencies,
employment agencies, clinics and personnel departments will have computers
available to them. The measurement systems can be designed to
operate on virtually any computer and, on many, can operate
simultaneously with scientific and business data processing. Computer-based
assessment has promise for helping us solve some of the important
problems in psychological measurement. A good system of psychometrics
can assist in the solution of many of the problems of today's society,
particularly in the identification of new sources of talent in
untapped segments of our society.

Such a system is now under development at the Work Adjustment
Project of the University of Minnesota. We will continue to explore
its implications for both the theory and practice of measurement
as well as its implications for some of the vocationally-relevant
problems of society.